Longest Common Subsequence in k Length Substrings

Authors

  • Gary Benson
  • Avivit Levy
  • B. Riva Shalom
Abstract

In this paper we define a new problem, motivated by computational biology: LCSk, which aims to find the maximal number of length-k substrings that match in both input strings while preserving their order of appearance. The traditional LCS definition is the special case of our problem where k = 1. We provide an algorithm that solves the general case in O(n^2) time, where n is the length of the input strings, equaling the time required for the special case of k = 1. The space requirement of the algorithm is O(kn). We also define a complementary EDk distance measure and show that EDk(A,B) can be computed in O(nm) time and O(km) space, where m and n are the lengths of the input sequences A and B respectively.
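The LCSk definition in the abstract can be illustrated with a small dynamic program. The sketch below is not taken from the paper: the recurrence is an assumption, namely the standard LCS table extended with a "common run length" table, and the names `lcsk`, `run`, and `best` are hypothetical. For clarity it stores the full table, so it uses O(nm) space rather than the O(kn) stated in the abstract; since each row only consults the previous row and the row k positions back, the table could presumably be trimmed to k + 1 rows.

```python
def lcsk(a: str, b: str, k: int) -> int:
    """Maximal number of ordered, non-overlapping length-k substring matches.

    Illustrative sketch only; the recurrence is an assumption, not the
    paper's published algorithm.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n, m = len(a), len(b)
    # run[i][j]: length of the longest common suffix of a[:i] and b[:j]
    run = [[0] * (m + 1) for _ in range(n + 1)]
    # best[i][j]: LCSk value for the prefixes a[:i] and b[:j]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
            # Skip the last symbol of either prefix.
            best[i][j] = max(best[i - 1][j], best[i][j - 1])
            # Or end a length-k match exactly at positions i and j.
            if run[i][j] >= k:
                best[i][j] = max(best[i][j], best[i - k][j - k] + 1)
    return best[n][m]


if __name__ == "__main__":
    print(lcsk("abcab", "abxab", 2))
    print(lcsk("ABCBDAB", "BDCABA", 1))
```

With k = 1 the second update reduces to the usual LCS step, which matches the abstract's remark that traditional LCS is the special case k = 1.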

Similar resources

Efficient algorithms for the longest common subsequence in $k$-length substrings

Finding the longest common subsequence in k-length substrings (LCSk) is a recently proposed problem motivated by computational biology. This is a generalization of the well-known LCS problem in which matching symbols from two sequences A and B are replaced with matching non-overlapping substrings of length k from A and B. We propose several algorithms for LCSk, being non-trivial incarnations of...

Longest Common Subsequence in at Least k Length Order-Isomorphic Substrings

We consider the longest common subsequence (LCS) problem with the restriction that the common subsequence is required to consist of at least k length substrings. First, we show an O(mn) time algorithm for the problem which gives a better worst-case running time than existing algorithms, where m and n are lengths of the input strings. Furthermore, we mainly consider the LCS in at least k length ...

Sparse Dynamic Programming for Longest Common Subsequence from Fragments

Sparse Dynamic Programming has emerged as an essential tool for the design of efficient algorithms for optimization problems coming from such diverse areas as computer science, computational biology, and speech recognition. We provide a new sparse dynamic programming technique that extends the Hunt–Szymanski paradigm for the computation of the longest common subsequence (LCS) and apply it to so...

A BSP/CGM Algorithm for the All-Substrings Longest Common Subsequence Problem

Given two strings X and Y of lengths m and n, respectively, the all-substrings longest common subsequence (ALCS) problem obtains the lengths of the subsequences common to X and any substring of Y. The sequential algorithm takes O(mn) time and O(n) space. We present a parallel algorithm for ALCS on a coarse-grained multicomputer (BSP/CGM) model with p < √m processors that takes O(mn/p) time an...

Subsequence Combinatorics and Applications to Microarray Production, DNA Sequencing and Chaining Algorithms

We investigate combinatorial enumeration problems related to subsequences of strings; in contrast to substrings, subsequences need not be contiguous. For a finite alphabet Σ, the following three problems are solved. (1) Number of distinct subsequences: Given a sequence s ∈ Σ^n and a nonnegative integer k ≤ n, how many distinct subsequences of length k does s contain? A previous result by Chase st...

Publication date: 2013